Constrained Discounted Dynamic Programming
Abstract
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M actions more than a nonrandomized stationary one. This result is known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, but the total number of actions which are added by randomization is at most M, (iii) the total number of actions that are added by nonstationarity is at most M. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems. We describe an algorithm to compute optimal policies with properties (i)-(iii) for constrained problems. The policies that satisfy properties (i)-(iii) have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. In contrast, a randomized stationary policy may require an infinite number of randomizations over time.
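For orientation, the constrained problem described in the abstract can be written schematically as follows. This is only a sketch: the symbols (discount factor \beta, reward functions r_0, ..., r_M, constraint levels c_1, ..., c_M, initial state x, policy \pi) and the direction of the inequalities are notational assumptions made here, not taken from the paper.

```latex
% Assumed notation: x_t, a_t are the state and action at step t,
% 0 < \beta < 1 is the discount factor, and r_0, ..., r_M are reward functions.
W_k(x,\pi) \;=\; \mathbb{E}_x^{\pi} \sum_{t=0}^{\infty} \beta^{t}\, r_k(x_t, a_t),
\qquad k = 0, 1, \dots, M,

% Constrained problem with M inequality constraints (levels c_k are assumed):
\text{maximize } W_0(x,\pi)
\quad \text{subject to} \quad
W_k(x,\pi) \;\ge\; c_k, \qquad k = 1, \dots, M.
```

In this notation, a policy \pi is feasible if it satisfies all M constraints, and the abstract's results concern how much randomization and nonstationarity an optimal feasible policy needs.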
Similar resources
Dynamic programming in constrained Markov decision processes
We consider a discounted Markov Decision Process (MDP) supplemented with the requirement that another discounted loss must not exceed a specified value, almost surely. We show that the problem can be reformulated as a standard MDP and solved using the Dynamic Programming approach. An example on a controlled queue is presented. In the last section, we briefly reinforce the connection of the Dyna...
Constrained dynamic programming with two discount factors: applications and an algorithm
We consider a discrete time Markov Decision Process, where the objectives are linear combinations of standard discounted rewards, each with a different discount factor. We describe several applications that motivate the recent interest in these criteria. For the special case where a standard discounted cost is to be minimized, subject to a constraint on another standard discounted cost but with ...
Non-randomized policies for constrained Markov decision processes
This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by nonrandomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.
A New Bi-Objective Model for a Multi-Mode Resource-Constrained Project Scheduling Problem with Discounted Cash Flows and Four Payment Models
The aim of a multi-mode resource-constrained project scheduling problem (MRCPSP) is to assign resource(s) with the restricted capacity to an execution mode of activities by considering relationship constraints, to achieve pre-determined objective(s). These goals vary with managers or decision makers of any organization who should determine suitable objective(s) considering organization strategi...
Discounted Continuous Time Markov Decision Processes: the Convex Analytic Approach
The convex analytic approach which is dual, in some sense, to dynamic programming, is useful for the investigation of multicriteria control problems. It is well known for discrete time models, and the current paper presents similar results for the continuous time case. Namely, we define and study the space of occupation measures, and apply the abstract convex analysis to the study of constraine...
Constrained Markovian decision processes: the dynamic programming approach
We consider semicontinuous controlled Markov models in discrete time with total expected losses. Only control strategies which meet a set of given constraint inequalities are admissible. One has to build an optimal admissible strategy. The main result consists in the constructive development of optimal strategy with the help of the dynamic programming method. The model studied covers the case o...
Journal:
- Math. Oper. Res.
Volume 21, Issue
Pages -
Publication date 1996